Some results on random design regression with 
long memory errors and predictors 

Rafal Kulik 

Department of Mathematics & Statistics, University of Ottawa 
585 King Edward Avenue, Ottawa ON K1N 6N5, Canada 
email: rkulik@uottawa.ca 

Pawel Lorek 
Mathematical Institute, University of Wroclaw 
pl. Grunwaldzki 2/4, 50-384 Wroclaw, Poland 
email: lorek@math.uni.wroc.pl 

June 14, 2010 

Abstract 

This paper studies nonparametric regression with long memory (LRD) 
errors and predictors. First, we formulate general conditions which guar- 
antee the standard rate of convergence for a nonparametric kernel esti- 
mator. Second, we calculate the Mean Integrated Squared Error (MISE). 
In particular, we show that LRD of errors may influence MISE. On the 
other hand, an estimator for a shape function is typically not influenced 
by LRD in errors. Finally, we investigate properties of a data-driven band- 
width choice. We show that Averaged Squared Error (ASE) is a good ap- 
proximation of MISE, however, this is not the case for a cross-validation 
criterion. 

1 Introduction 



Consider the random design regression model, 

$$Y_i = m(X_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $X_i$, $\varepsilon_i$, $i = 1, \ldots, n$, are two mutually independent sequences of random
variables. We investigate the problem of estimating the function $m(\cdot)$. This 
problem is quite well understood for weakly dependent data. Also, in the last 
two decades, it has received a lot of attention in case of long range dependence 
(LRD). In this situation most results have been obtained under quite specific
assumptions on the errors and/or predictors. In particular, it is typically 
assumed that both sequences are infinite order moving averages, or that they are
defined as (nonlinear) functionals of Gaussian sequences. However, in recent 
years different authors have proposed many new (nonlinear) LRD models. Although 
it is reasonable to assume some structure on the observable predictors, particular 
assumptions on the errors are almost impossible to verify. Therefore, one of our 
goals is to state general conditions, which guarantee appropriate limit theorems. 

In Section 2.2 we discuss the central limit theorem for the Nadaraya-Watson 
estimator of $m(\cdot)$. As is well known, for LRD data we have a dichotomous 
behaviour: if the bandwidth $h$ is small, then the rate of convergence is $\sqrt{nh}$, 
the same as in the i.i.d. case. Otherwise, if the bandwidth is large, long memory
contributes. We refer the reader to [17] for the most general result; see also [4], 
[6], [7]. We state general conditions which guarantee the $\sqrt{nh}$ rate of convergence. 
These conditions can be easily verified for (subordinated) linear LRD processes, 
FARIMA-GARCH models, stochastic volatility models (including LARCH), as 
well as for antipersistent errors. In particular, if $\varepsilon_i$, $i \ge 1$, is a linear process 
such that $\mathrm{Var}(\sum_{i=1}^n \varepsilon_i) \sim C n^{2-\alpha}$, $\alpha \in (0,1)$, then $\sqrt{nh}$-consistency holds if 
$h n^{1-\alpha} \to 0$. This agrees with previous results. (Here and in the sequel, $C$ is a 
generic constant.) On the other hand, if the errors are described by a 
stochastic volatility model, then $\sqrt{nh}$-consistency always holds. 

To verify that the condition $h n^{1-\alpha} \to 0$ holds, we need to know the parameter
$\alpha$. However, the random variables $\varepsilon_i$, $i = 1, \ldots, n$, are not directly observable, 
and the performance of various estimators of $\alpha$ is not clear. Therefore, we will 
modify our estimation problem. In some cases, for the purpose of exploratory 
data analysis, it suffices to estimate a shape function, $m^*(x) = m(x) - \int m f$. 
As indicated in [8], [20] and [16], LRD of errors does not influence estimation of 
$m^*(\cdot)$. This effect is proven here for the Nadaraya-Watson estimator. In fact, 
for the linear LRD processes mentioned above, we obtain $\sqrt{nh}$-consistency if 
the predictors are independent and $h^5 n^{1-\alpha} \to 0$, which is much weaker than 
the previous condition. However, this approach does not work in the case of LRD 
predictors. 

Next, we investigate properties of the Mean Integrated Squared Error (MISE). 
We assume that $\varepsilon_i$, $i \ge 1$, is the linear process mentioned above. We show 
that the optimal MISE has a dichotomous behaviour: if the memory parameter 
$\alpha$ is greater than 4/5, then the rate of convergence is $n^{-4/5}$, the same as 
for i.i.d. errors. However, if $\alpha < 4/5$, then the rate of convergence is $n^{-\alpha}$. 
Interestingly, a possible LRD of predictors does not influence the asymptotic 
behaviour of MISE. Similar results were obtained for density estimation, see 
[10]. On the other hand, in the fixed-design case, LRD always influences the rates 
of convergence. For details we refer to [11]. 

To reduce the influence of LRD on MISE, we may consider the shape function.
It is shown that for independent predictors, the Mean Integrated Squared 
Error corresponding to $m^*$ has the same asymptotic behaviour as in the case of 
i.i.d. errors. In other words, from the expected-risk point of view, long memory 
does not influence estimation of the shape function. We also note in passing 
that the optimal bandwidth choice for the shape estimation agrees with the 
optimal one for the estimation of the original function, as long as $\alpha > 2/5$. 

With the help of the formulas for MISE, we obtain the optimal bandwidths. As 
usual, they are not quite practical, since they involve unknown parameters and 
a data-driven method has to be used. As argued in [12], a plug-in method 
has some advantages over cross-validation. However, let us note that the optimal
bandwidth is of the form $Cn^{-1/5}$ if $\alpha > 2/5$, and $Cn^{-(1-\alpha)/3}$ otherwise. 
Consequently, we have to know $\alpha$ to be able to construct an appropriate plug-in 
bandwidth selector. Therefore, we focus on cross-validation. Let us indicate first 
that cross-validation is a valid procedure in the fixed-design case. The procedure 
produces a bandwidth which is close to the optimal one, and the cross-validation 
criterion itself is a good approximation to MISE. The reader is referred to [12]. 
In the density estimation case, however, cross-validation is a good approximation
to MISE if and only if $\alpha > 4/5$, see [13] and [5] as well as the discussion in 
Section 2.4 for more details. 

We will show that for random-design regression, the empirical minimizer of 
the cross-validation criterion is a good approximation to the optimal $h$; however, 
the cross-validation criterion itself is a valid approximation only for $\alpha > 4/5$
(which agrees with the findings in [13] and [5]). 

The paper is organized as follows. In Section 2.1 we collect assumptions 
and define estimators. Sections 2.2, 2.3 and 2.4 contain results on central limit 
theorem, mean square error and bandwidth choice, respectively. In Section 3 we 
illustrate our findings by simulations. Next, in Section 4 we present examples 
of time series, where it is possible to verify our conditions. Finally, the proofs 
are presented in the last section. 

2 Results 

2.1 Assumptions and estimators 

We will consider the following assumptions on the predictors $X_i$, $i \ge 1$: 

(P1) $X_i$, $i \ge 1$, is a sequence of i.i.d. random variables. In this case, let 
$\mathcal{X}_i = \sigma(X_1, \ldots, X_i)$. 

(P2) $X_i$, $i \ge 1$, is the infinite order moving average 

$$X_i = \sum_{k=0}^{\infty} a_k \zeta_{i-k}, \qquad a_0 = 1,$$

where $\zeta_i$, $-\infty < i < \infty$, is a sequence of centered, i.i.d. Gaussian 
random variables and, for $\alpha_X \in (0,1)$, $a_k = A_0 k^{-(\alpha_X+1)/2}$ for some $A_0$. 
Consequently, $X_i$ are Gaussian and they are assumed to have unit variance.
In this case we denote $\mathcal{X}_i = \sigma(\zeta_i, \zeta_{i-1}, \ldots)$. Furthermore, note that 
$\mathrm{Var}(\sum_{i=1}^n X_i) \sim A_1 n^{2-\alpha_X}$, where $A_1$ is a finite constant. 

We note that from the point of view of our results stated below, (P1) can be 
treated as a special case of (P2), by plugging in $\alpha_X = 1$. Thus, the results 
which are stated under the (P2) assumption are valid also under (P1). 

Under (P1), we do not need to assume a particular structure of the errors $\varepsilon_i$, 
$i \ge 1$. The general assumption is 

(E0) $\varepsilon_i = G(\eta_i, \eta_{i-1}, \ldots)$, $i \ge 1$, where $\eta_i$, $-\infty < i < \infty$, is an i.i.d. sequence, 
independent of $X_i$, $i \ge 1$. We also assume that $E[\varepsilon_i] = 0$, $E[\varepsilon_i^2] = 1$. 
This assures that $\varepsilon_i$, $i \ge 1$, is a stationary ergodic sequence. Denote
$\mathcal{H}_i = \sigma(\eta_i, \eta_{i-1}, \ldots)$. 

Under (P2) we will assume that 

(E2) $\varepsilon_i$, $i \ge 1$, is the infinite order moving average 

$$\varepsilon_i = \sum_{k=0}^{\infty} c_k \eta_{i-k}, \qquad c_0 = 1,$$

where $\eta_i$, $-\infty < i < \infty$, is a sequence of centered, i.i.d. random variables 
with a finite fourth moment, $E[\varepsilon_i^2] = 1$, independent of $X_i$, $i \ge 1$. We 
also assume that for $\alpha \in (0,1)$, $c_k \sim C_0 k^{-(\alpha+1)/2}$ as $k \to \infty$. Then 
$\mathrm{Var}(\sum_{i=1}^n \varepsilon_i) \sim C_1^2 n^{2-\alpha}$, where $C_1$ is a constant. 

We consider the classical Nadaraya-Watson estimator 

$$\hat m_h(x) = \frac{1}{nh \hat f_h(x)} \sum_{i=1}^n K_h(x - X_i) Y_i, \qquad (1)$$

where a nonnegative and bounded kernel $K$ fulfills the usual conditions: 

$$\int K(u)\,du = 1, \qquad \int uK(u)\,du = 0, \qquad \kappa_2 := \int u^2 K(u)\,du.$$

For future use we denote $K_h(\cdot) = K(\cdot/h)$ and $\kappa_1 := \int K^2(u)\,du \ne 0$. The 
bandwidth $h = h_n$ fulfills the usual conditions $h \to 0$ and $nh \to \infty$. Furthermore,
$\hat f_h(x) = (nh)^{-1} \sum_{i=1}^n K_h(x - X_i)$ is the standard kernel estimator of the density $f$ of $X_1$. 
Define further the shape function 

$$m^*(x) = m(x) - \int m f$$

and its estimator $\hat m_h^* = \hat m_h - \widehat{\int m f}$, where 

$$\widehat{\int m f} = \frac{1}{n} \sum_{i=1}^n Y_i.$$

Note that the latter is an unbiased estimator of $\int m f$, i.e. 

$$\frac{1}{n} \sum_{i=1}^n E[Y_i] = E[m(X_1)] = \int m f.$$

Finally, we will assume that $f$ and $m$ are twice differentiable with bounded 
derivatives and that $f(x) > 0$ for each $x$. 
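
The estimators of this section can be written down in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian kernel (one admissible choice of $K$) and assumes that $\int m f$ is estimated by the sample mean of the responses, as the unbiasedness remark above suggests. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: nonnegative, bounded, integrates to 1, zero first moment."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x_eval, x, y, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate of m at the points x_eval with bandwidth h."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # weights[j, i] = K((x_eval[j] - x[i]) / h)
    weights = kernel((x_eval[:, None] - x[None, :]) / h)
    return weights @ y / weights.sum(axis=1)

def shape_estimate(x_eval, x, y, h, kernel=gaussian_kernel):
    """Estimate m*(x) = m(x) - int m f by subtracting the sample mean of Y (assumed form)."""
    return nadaraya_watson(x_eval, x, y, h, kernel) - np.mean(y)
```

Because the same weights appear in the numerator and the denominator, the factor $1/(nh)$ cancels and need not be computed explicitly.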

2.2 Central limit theorems 

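Throughout this section we assume that (E0) holds. Let us formulate the following
conditions: 

$$\sqrt{\frac{h}{n}} \sum_{i=1}^n \left( E[\varepsilon_i \mid \mathcal{H}_{i-1}] - \varepsilon_i \right) = o_P(1). \qquad (A)$$

$$\sqrt{\frac{h}{n}} \sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}] = o_P(1). \qquad (B1)$$

$$h^2 \sqrt{\frac{h}{n}} \sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}] = o_P(1). \qquad (B2)$$

$$h^5 n^{1-\alpha_X} \to 0. \qquad (C1)$$

$$h n^{1-\alpha_X} \to 0. \qquad (C2)$$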

Define 

$$m_n(x) = \frac{E[K((x - X_1)/h)\, m(X_1)]}{E[K((x - X_1)/h)]}, \qquad m_n^*(x) = m_n(x) - \int m f.$$

Clearly, the bias is 

$$\mathrm{bias}(x) = m_n(x) - m(x) \sim h^2 \left( \frac{m''(x)}{2} + \frac{m'(x) f'(x)}{f(x)} \right) \int u^2 K(u)\,du. \qquad (2)$$

Proposition 2.1. Assume (P1) and (E0). Under the conditions (A) and (B1) 
we have 

$$\sqrt{\frac{nh f(x)}{\kappa_1}} \left( \hat m_h(x) - m_n(x) \right) \xrightarrow{d} N(0,1). \qquad (3)$$

Assume (P2) and (E2). Under the conditions (A), (B1) and (C1), the asymptotics
(3) holds. 

Remark 2.2. Note that there is no symmetry between the LRD assumptions on $\varepsilon_i$ 
and $X_i$. For example, assume that $\varepsilon_i$, and thus $E[\varepsilon_i \mid \mathcal{H}_{i-1}]$, is a linear process 
with $\mathrm{Var}(\sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}]) \sim C_1^2 n^{2-\alpha}$, $\alpha \in (0,1)$; see Example 4.1. Then (B1) 
holds if $h n^{1-\alpha} \to 0$, whereas the assumption for $\alpha_X$ requires $h^5 n^{1-\alpha_X} \to 0$. 

Proposition 2.3. Assume (P2). Under the conditions (A), (B2) and (C2) we 
have 

$$\sqrt{\frac{nh f(x)}{\kappa_1}} \left( \hat m_h^*(x) - m_n^*(x) \right) \xrightarrow{d} N(0,1).$$

Remark 2.4. The result under (P1) follows by taking $\alpha_X = 1$ in (C2). In 
this case the condition is always fulfilled. The difference between the estimators 
of $m(\cdot)$ and $m^*(\cdot)$ appears by comparison of (B1) and (B2). In the case of $m^*(\cdot)$ 
much larger bandwidths are allowed to achieve classical rates of convergence. 
The additional condition (A), required for $m^*(\cdot)$ estimation, is typically easy to 
verify, even for long memory or non-stationary sequences. 



Recall from the previous remark that LRD in predictors basically does not 
matter. This is not the case for the shape estimation. Assume that $h n^{1-\alpha_X} \to \infty$,
but (A), (B2), (C1) and 



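hold. For example, this happens when $\varepsilon_i$, and thus $E[\varepsilon_i \mid \mathcal{H}_{i-1}]$, is a linear 
process with $\mathrm{Var}(\sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}]) \sim C_1^2 n^{2-\alpha}$, $\alpha \in (0,1)$, and $h^5 n^{1-\alpha} \to 0$. It 
follows from the proof of Proposition 2.3 that 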



2.3 Mean Square Error 

In this section we establish asymptotic formulas for the mean integrated squared 
error for both $m(\cdot)$ and $m^*(\cdot)$. In particular, it will be shown that we may improve
the rates of convergence if we estimate the shape function instead of $m(\cdot)$. 

Consider the following weighted versions of the mean integrated squared 
errors: 



where $r(\cdot)$ is a weight function and the integrals are taken over the support of $f$. 
Proposition 2.5. Assume (P2) and (E2). Then we have 

Remark 2.6. The first term in $\mathrm{MISE}_r(h)$ describes i.i.d. type behaviour, the 
second one is due to bias. The terms involving $n^{-\alpha}$ describe a possible contribution
of long memory. Note that we have to include the term $h^2 n^{-\alpha}$. These terms 
do not have influence on the optimal behaviour of MISE, but they influence $h_{\rm opt}$, 
the optimal bandwidth choice. Indeed, one can verify that 

$$h_{\rm opt} \sim \begin{cases} C n^{-1/5} & \text{if } \alpha > 2/5; \\ C n^{-(1-\alpha)/3} & \text{if } \alpha < 2/5, \end{cases}$$

so that $\mathrm{MISE}_r(h_{\rm opt})$ is proportional to $n^{-4/5}$ if $\alpha > 4/5$ and $n^{-\alpha}$ if $\alpha < 4/5$. 
Note also that there is no contribution of LRD of the predictors. 
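
The two regimes of Remark 2.6 can be illustrated numerically. The sketch below is not taken from the paper: the function `surrogate_mise` is a hypothetical stand-in that keeps only the four orders of magnitude named in the remark, $(nh)^{-1}$, $h^4$, $n^{-\alpha}$ and $h^2 n^{-\alpha}$, with every constant set to 1 (an assumption). It then recovers the bandwidth exponent from a grid search at two sample sizes.

```python
import numpy as np

def optimal_bandwidth_exponent(alpha):
    """Return b such that h_opt ~ C * n**(-b), following the two regimes of Remark 2.6."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return 1.0 / 5.0 if alpha > 2.0 / 5.0 else (1.0 - alpha) / 3.0

def optimal_mise_exponent(alpha):
    """Return r such that MISE(h_opt) is proportional to n**(-r)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return min(4.0 / 5.0, alpha)

def surrogate_mise(h, n, alpha):
    """Illustrative risk: variance + squared bias + two long-memory terms, constants = 1."""
    return 1.0 / (n * h) + h**4 + n ** (-alpha) + h**2 * n ** (-alpha)

def numerical_bandwidth_exponent(alpha, n_small=1e6, n_large=1e8):
    """Estimate the exponent b from the grid minimizers at two sample sizes."""
    grid = np.logspace(-4, 0, 4001)
    h_small = grid[np.argmin(surrogate_mise(grid, n_small, alpha))]
    h_large = grid[np.argmin(surrogate_mise(grid, n_large, alpha))]
    return -np.log(h_large / h_small) / np.log(n_large / n_small)
```

Under these assumptions the numerical exponent is close to $1/5$ for $\alpha > 2/5$ and close to $(1-\alpha)/3$ for $\alpha < 2/5$; the two formulas coincide at $\alpha = 2/5$.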

Proposition 2.7. Assume that 

$$\mathrm{Var}\left( \sum_{i=1}^n \left( E[\varepsilon_i \mid \mathcal{H}_{i-1}] - \varepsilon_i \right) \right) = O(n) \qquad (5)$$

holds. Under (P2) and (E2) we have 



Remark 2.8. The condition (5) can be verified for many time series, including 
time series with long memory (see Section 4). The result under (P1) can be 
obtained by taking $\alpha_X = 1$; then the last term is negligible. Consequently, 
under (P1) we remove long memory of errors; however, under (P2) there is an 
additional term due to long memory of predictors. 

Under (P1) the optimal bandwidth, $h^*_{\rm opt}$, is proportional to $n^{-1/5}$, yielding 
$\mathrm{MISE}^*_r(h^*_{\rm opt}) \sim C n^{-4/5}$. 

Remark 2.9. Hall and Hart [11] were the first to establish the mean squared 
error behaviour in the case of fixed-design regression. The meaning of their results 
is that LRD in errors always influences estimation of the conditional mean. 

On the other hand, in the case of kernel density estimation based on LRD data 
$\varepsilon_1, \ldots, \varepsilon_n$, Hall and Hart [10] showed a similar dichotomous behaviour, as
described in Proposition 2.5. 



2.4 Empirical bandwidth choice 

In this section we study properties of empirical bandwidth selector procedures. 
We shall focus on the case of i.i.d. predictors, to show an influence of LRD errors 
on the empirical risk. We will work also under the two additional assumptions: 
E[m^(Xi)] < oo and that / has bounded support. This will simplify some 
computations and allow us to write $\mathrm{MISE}_f(h)$. Let 

$$\mathrm{ASE}(h) = \frac{1}{n} \sum_{i=1}^n \left( \hat m_h(X_i) - m(X_i) \right)^2$$

be the Averaged Squared Error. 

First, we answer the question whether minimization of ASE leads to a valid 
minimizer. The answer is affirmative: the meaning of (6) is that the quotient 
of $\hat h$, the minimizer of ASE, and the minimizer of $\mathrm{MISE}_f$ converges to 1 in 
probability. Furthermore, $\mathrm{ASE}(\hat h)/\mathrm{MISE}_f(h_{\rm opt}) \xrightarrow{P} 1$. 

Proposition 2.10. Assume that f,m G . Let Bi < B2 be finite and positive 
constants. Under (P1) and (E2) we have, uniformly over $[B_1 h_{\rm opt}, B_2 h_{\rm opt}]$, 

$$\mathrm{ASE}(h) - \mathrm{MISE}_f(h) = o_P\left( \mathrm{MISE}_f(h_{\rm opt}) \right). \qquad (6)$$
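
In a simulation, where the regression function is known, ASE can be evaluated directly. A minimal sketch under stated assumptions: a Gaussian kernel is used, the fit at $X_i$ includes the $i$th observation, and the helper names are hypothetical.

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Nadaraya-Watson fit with a Gaussian kernel (normalising constants cancel)."""
    u = (np.asarray(x_eval, dtype=float)[:, None] - np.asarray(x, dtype=float)[None, :]) / h
    w = np.exp(-0.5 * u**2)
    return w @ np.asarray(y, dtype=float) / w.sum(axis=1)

def averaged_squared_error(x, y, h, m):
    """ASE(h): mean squared distance between the fit and the true m at the design points."""
    fitted = nadaraya_watson(x, x, y, h)
    return float(np.mean((fitted - m(np.asarray(x, dtype=float))) ** 2))

def ase_minimizer(x, y, m, h_grid):
    """Bandwidth on h_grid with the smallest ASE (only computable when m is known)."""
    values = [averaged_squared_error(x, y, h, m) for h in h_grid]
    return h_grid[int(np.argmin(values))]
```

Since ASE depends on the unknown $m$, it serves as a benchmark for data-driven selectors rather than as a selector itself.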

However, what we are interested in from a practical point of view is, first, 
whether cross-validation produces a valid bandwidth, and, second, whether 
cross-validation is a good approximation to ASE and MISE. To answer this, 
let $\hat m^{(l)}_{h,j}(\cdot)$ be the version of the estimator (1) where the summation is over 
$i \in I_j(l)$, where $I_j(l) := \{i : |j - i| > l\}$. The empirical cross-validation bandwidth is 
obtained via minimizing 

$$\mathrm{CV}_l(h) = \frac{1}{n} \sum_{j=1}^n \left( Y_j - \hat m^{(l)}_{h,j}(X_j) \right)^2.$$

Denote $\mathrm{CV}(h) = \mathrm{CV}_0(h)$. Note that if both predictors and errors are i.i.d., then 

$$E[\mathrm{CV}(h)] \approx E[\mathrm{ASE}(h)] + E[\varepsilon_1^2] \approx \mathrm{MISE}_f(h) + E[\varepsilon_1^2],$$

i.e. in the average sense $\mathrm{CV}(h)$ is the exact approximation of $\mathrm{MISE}_f(h) + E[\varepsilon_1^2]$. 
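
The leave-out criterion can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses a Gaussian kernel, takes $\mathrm{CV}_l(h)$ to be the average squared prediction error with observations $|j - i| \le l$ removed, and the function names are hypothetical.

```python
import numpy as np

def cross_validation(x, y, h, l=0):
    """CV_l(h): mean squared error of leave-(2l+1)-out Nadaraya-Watson predictions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(x))
    # w[j, i] = K((X_j - X_i) / h); Gaussian kernel, constants cancel in the ratio
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    # keep only indices i with |j - i| > l, i.e. i in I_j(l)
    w = np.where(np.abs(idx[:, None] - idx[None, :]) > l, w, 0.0)
    denom = w.sum(axis=1)
    if np.any(denom <= 0.0):
        raise ValueError("no usable neighbours: bandwidth too small or l too large")
    return float(np.mean((y - w @ y / denom) ** 2))

def cv_bandwidth(x, y, h_grid, l=0):
    """Bandwidth on h_grid that minimizes CV_l(h)."""
    scores = [cross_validation(x, y, h, l) for h in h_grid]
    return h_grid[int(np.argmin(scores))]
```

With `l=0` this is ordinary leave-one-out cross-validation; larger `l` also removes the temporal neighbours of the $j$th observation.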

The result for LRD data is as follows. 

Proposition 2.11. Assume that f^mEC^. Let Bi < B2 he finite and positive 
constants. Under (P1) and (E2) we have, uniformly over $[B_1 h_{\rm opt}, B_2 h_{\rm opt}]$, 

cv(fe)-MiSE,(fe)-iE:Li--f _ 1 ly-. ^ 

MISE/(/jopt) MISEy(/iopt) n2 ^ 

Let us comment on the above result. For a fixed-design case, under appro- 
priate conditions on we have (see [12]) 

n 

GY[{h) - MISE'(;i) - - E^' = op[n-^'''% (7) 

uniformly over [Cn-"/^, C'n-"/^], C < C". (Here, CYiih) and MISE'(/i) are 
defined in a slightly different way, to accommodate fixed-design). Note that 
the rate of MISE'(/iQpj) and n "^'^ is asymptotically proportional to 
$h'_{\rm opt}$, where the latter is the asymptotically optimal bandwidth in the fixed 
design regression. This means that the ratio of the bandwidth obtained by 
cross-validation and the MISE optimal bandwidth converges to 1 in probability. 
Also, $E[\mathrm{CV}'(h'_{\rm opt})]$ provides a valid approximation to $\mathrm{MISE}'(h'_{\rm opt}) + E[\varepsilon_1^2]$. (The 
last statement is intuitive only, since the rate at the right hand side of (7) is in 
probability rather than in L^). 

Now, from Proposition 2.11 we conclude that in the random-design regression 
$\hat h_{\rm CV}$, the minimizer of $\mathrm{CV}(h)$, has the property $\hat h_{\rm CV}/h_{\rm opt} \xrightarrow{P} 1$. However, $\mathrm{CV}(h)$ 
itself provides a valid approximation only if $\alpha > 4/5$. This agrees with the results 
in [13] in the case of density estimation. 

3 Numerical studies 

Simulation studies were conducted as follows: 

1. We set h. 

2. Simulate 100 errors $\varepsilon_i$ from FARIMA with different LRD parameters $d$. Note 
that $d = (1 - \alpha)/2$ and the i.i.d. case corresponds to $d = 0$. We used the R package 
fracdiff. 

3. Simulate 100 predictors following the standard normal distribution. First, 
we simulate i.i.d. predictors, then LRD predictors with $d_X := (1 - \alpha_X)/2 = 0.3$. 

4. This procedure was repeated 500 times. 

5. As the output we get a Monte Carlo approximation to the MISE and 
MISE*. 
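
The steps above can be sketched in Python. This is an approximation of the procedure, not a reproduction of it: the FARIMA(0, d, 0) errors are generated by truncating the MA($\infty$) representation rather than with the R package used by the authors, only i.i.d. predictors are covered, and the integration grid, the weight function $f$, the Gaussian kernel and the choice $m(x) = \sin(2\pi x)$ (taken from the discussion of Table 1) are assumptions of this sketch.

```python
import numpy as np

def farima_0d0(n, d, rng, trunc=1000):
    """Simulate n values of FARIMA(0, d, 0) by truncating its MA(infinity) form."""
    k = np.arange(1, trunc + 1)
    # psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eta = rng.standard_normal(n + trunc)
    return np.convolve(eta, psi, mode="valid")

def nadaraya_watson(x_eval, x, y, h):
    """Gaussian-kernel Nadaraya-Watson fit; the floor avoids 0/0 far from the data."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return w @ y / np.maximum(w.sum(axis=1), np.finfo(float).tiny)

def monte_carlo_mise(d, h, n=100, reps=500, seed=0):
    """Monte Carlo approximations of MISE and MISE* for m(x) = sin(2*pi*x)."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-2.0, 2.0, 161)
    step = grid[1] - grid[0]
    f = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)  # weight: N(0,1) density
    m = np.sin(2.0 * np.pi * grid)                     # here m = m* since int m f = 0
    ise = np.empty(reps)
    ise_star = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(n)                     # i.i.d. predictors
        y = np.sin(2.0 * np.pi * x) + farima_0d0(n, d, rng)
        fit = nadaraya_watson(grid, x, y, h)
        ise[r] = np.sum((fit - m) ** 2 * f) * step
        ise_star[r] = np.sum((fit - y.mean() - m) ** 2 * f) * step
    return ise.mean(), ise_star.mean()
```

A call such as `monte_carlo_mise(0.3, 0.05)` returns the pair of averages for one cell of the design. The simulated errors are not rescaled to unit variance, so the numbers need not match the tables below.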

Table 1 shows results for both $m(\cdot)$ and $m^*(\cdot)$ for the function $m(x) = \sin(2\pi x)$ 
and bandwidths $h = 0.05$, $h = 1$, respectively. Note that in this case $m = m^*$. 
Even for a relatively small sample size, we may observe that MISE* for the 
shape function remains constant for either choice of the bandwidth ($h = 0.05$ or 
$h = 1$). On the other hand, for the small bandwidth $h = 0.05$ we observe that 
MISE for $m(\cdot)$ stays constant up to $d = 0.25$, but it grows almost immediately for 
$h = 1$. Furthermore, LRD starts to dominate earlier for the larger bandwidth. 
This is in line with Propositions 2.5 and 2.7. Next (Table 2), we repeated this 
experiment with dependent predictors, choosing $d_X = (1 - \alpha_X)/2 = 0.3$. By 
comparing both tables, note that there is little influence of LRD of predictors 
on MISE for $m(\cdot)$. This is still in line with Proposition 2.5. On the other 
hand, for $m^*(\cdot)$ estimation, Proposition 2.7 suggests that LRD of predictors 
should contribute. It does not seem to be the case here; however, our simulation 
studies suggest that MISE* depends on the constant $E^2[m(X_1)X_1]$, as indicated 
by our theoretical results. 

Finally, based on 500 simulations, we computed averaged values of the CV criterion
with optimally chosen $h$. Table 3 indicates that LRD influences CV almost 
immediately, which is once again in line with our theoretical results. 

d      h = 0.05                  h = 1
       MISE        MISE*         MISE        MISE*
0.00   0.3995014   0.3983940     0.5087492   0.5016983
0.05   0.3932386   0.3870820     0.5104933   0.5011370
0.10   0.3859548   0.3791453     0.5136161   0.5042601
0.15   0.3780757   0.3724782     0.5163214   0.5030991
0.20   0.4095343   0.3910381     0.5255956   0.5026314
0.25   0.4050078   0.3850845     0.5322871   0.5027839
0.30   0.4242228   0.3794874     0.5494079   0.5019103
0.35   0.4493550   0.3940342     0.5975138   0.5044394
0.40   0.5848862   0.3805059     0.6432639   0.5016800
0.45   0.8899494   0.3822220     0.8908165   0.5038973

Table 1: MISE for some values of the dependence parameter d and i.i.d. predictors. 



d      h = 0.05
       MISE        MISE*
0.00   0.4021701   0.3963913
0.05   0.4454991   0.4385924
0.10   0.4337356   0.4227792
0.15   0.4281452   0.4225283
0.20   0.4262649   0.4121518
0.25   0.4436659   0.4137239
0.30   0.4417674   0.4049835
0.35   0.4948932   0.4307672
0.40   0.5673097   0.4146042
0.45   0.8217908   0.4258788

Table 2: MISE for some values of the dependence parameter d and LRD predictors. 



4 Examples 

In this section we present some examples, which show that our conditions are 
easy to verify for very different long memory processes. 

Unless specified otherwise, $\{Z, Z_i, i \ge 1\}$ and $\{\eta, \eta_i, -\infty < i < \infty\}$ will be 
sequences of centered i.i.d. random variables with $E[Z^2] = E[\eta^2] = 1$. A more 
detailed description of most of the models below, together with stationarity and 
moment conditions, can be found in [9]. 

d      CV
0.00   0.2912500
0.05   0.2921772
0.10   0.2960790
0.15   0.2972158
0.20   0.310086
0.25   0.3206689
0.30   0.3415160
0.35   0.3687273
0.40   0.4133771
0.45   0.4663486

Table 3: CV for some values of the dependence parameter d and i.i.d. predictors. 

Example 4.1 (Linear processes). Assume (E2). Then we have 

$$E[\varepsilon_i \mid \mathcal{H}_{i-1}] = \sum_{k=1}^{\infty} c_k \eta_{i-k} + E[\eta_i \mid \mathcal{H}_{i-1}] =: \varepsilon_{i,i-1} + 0, \qquad (8)$$



and $\{\varepsilon_{i,i-1}, i \ge 1\}$ is an LRD linear process with (up to a constant) the same 
limiting behavior as $\{\varepsilon_i, i \ge 1\}$. Consequently, 

$$\sum_{i=1}^n \left\{ E[\varepsilon_i \mid \mathcal{H}_{i-1}] - \varepsilon_i \right\} = -\sum_{i=1}^n \eta_i, \qquad (9)$$

and 

$$\sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}] = \sum_{i=1}^n \varepsilon_{i,i-1}. \qquad (10)$$

By (9) and since $\eta_i$ are i.i.d., (A) is automatically fulfilled. Finally, by (10), we 
conclude that (B2) holds if 

$$h^5 n^{1-\alpha} \to 0. \qquad (11)$$

Note that this condition is much less restrictive than 

$$h n^{1-\alpha} \to 0, \qquad (12)$$

which is required for (B1) to hold. The latter condition is the same as in [19]. 
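
To see where the two bandwidth conditions come from, one can compare variances. The sketch below assumes that (B1) and (B2) are normalised by $\sqrt{h/n}$ and $h^2\sqrt{h/n}$, respectively (our reading of the conditions in Section 2.2), and uses $\mathrm{Var}(\sum_{i=1}^n \varepsilon_{i,i-1}) \sim C n^{2-\alpha}$:

```latex
\begin{align*}
\mathrm{Var}\Big( \sqrt{h/n}\, \sum_{i=1}^n \varepsilon_{i,i-1} \Big)
  &\sim \frac{h}{n}\, C n^{2-\alpha} = C\, h\, n^{1-\alpha}, \\
\mathrm{Var}\Big( h^2 \sqrt{h/n}\, \sum_{i=1}^n \varepsilon_{i,i-1} \Big)
  &\sim \frac{h^5}{n}\, C n^{2-\alpha} = C\, h^5\, n^{1-\alpha}.
\end{align*}
% With the classical bandwidth h = n^{-1/5}:
%   h^5 n^{1-\alpha} = n^{-\alpha}      -> 0 for every \alpha in (0,1),
%   h   n^{1-\alpha} = n^{4/5-\alpha}   -> 0 only if \alpha > 4/5.
```

Under these assumptions, the classical bandwidth $h = n^{-1/5}$ always satisfies the shape-function condition, whereas the condition for $m(\cdot)$ itself fails once $\alpha \le 4/5$, which matches the threshold reported for MISE.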

Example 4.2 (Functionals of Linear Processes). Consider the linear process 
from Example 4.1. Let T be a twice differentiable functional and let 



vfc=o 



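Consider the same assumptions as in Example 4.1. Additionally, we assume 
that $E[T(\varepsilon_1)] = 0$. 

Let $f_\eta$ be the density of $\eta_1$. Then, by considering two terms of the Taylor 
expansion (which is enough for a quadratic functional), 

$$\begin{aligned}
E[\varepsilon_i \mid \mathcal{H}_{i-1}] &= \int T(u + \varepsilon_{i,i-1}) f_\eta(u)\,du \\
&= \int T(u) f_\eta(u)\,du + \varepsilon_{i,i-1} \int T'(u) f_\eta(u)\,du + \frac{1}{2} \varepsilon_{i,i-1}^2 \int T''(u) f_\eta(u)\,du \\
&= E[T(\eta_1)] + E[T'(\eta_1)]\, \varepsilon_{i,i-1} + \frac{1}{2} E[T''(\eta_1)]\, \varepsilon_{i,i-1}^2
\end{aligned}$$

and 

$$\varepsilon_i = T(\eta_i) + T'(\eta_i)\, \varepsilon_{i,i-1} + \frac{1}{2} T''(\eta_i)\, \varepsilon_{i,i-1}^2.$$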

For simplicity take T(w) ~ — ^[Vi + o] ^"^^ assume that the density is 
symmetric. Then E[T(77i)] = E[e2^_J = '-Y.V=i4^ E[r'(c77i)] = 2E[/7i] = 0. 
Consequently, 

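$$\mathrm{Var}\left( \sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}] \right) = \mathrm{Var}\left( \sum_{i=1}^n \left( \varepsilon_{i,i-1}^2 - E[\varepsilon_{i,i-1}^2] \right) \right) \sim O(n^{2-2\alpha} \vee n).$$

We conclude that (B2) holds for all $\alpha > 1/2$ or if $n^{1-2\alpha} h^5 \to 0$, whereas (B1) 
holds when $\alpha > 1/2$ or if $n^{1-2\alpha} h \to 0$. 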
Moreover, 

E[ei|'Hi_i] - Ei = -rji - 2r]i£i^i-i 

and thus (A) holds since $\eta_i$, $i \ge 1$, are i.i.d. and $\eta_i \varepsilon_{i,i-1}$, $i \ge 1$, are uncorrelated. 

Similar consideration can be carried out for any functional T, in particular, 
for T{u) ^ \uf - E[|ei|'']. The set of parameters for which (B2) and (Bl) hold 
depends on the so-called power rank of $T$ (see [14] for more details). If the power 
rank is 1, then (B2) and (B1) hold for $\alpha$, $h$ as in Example 4.1; if the power rank 
is 2, then the conditions are fulfilled for $\alpha$, $h$ as in the case of the quadratic functional 
discussed above. 

Example 4.3 (FARIMA-GARCH processes). Assume that 

$$\varepsilon_i = (1 - B)^{-d} \phi^{-1}(B) \psi(B) \eta_i,$$

where $\eta_i = Z_i h_i^{1/2}$ and $h_i$ is GARCH($r$, $s$), 

$$h_i = a_0 + \sum_{j=1}^r a_j \eta_{i-j}^2 + \sum_{k=1}^s \beta_k h_{i-k}.$$

Here, $B$ is the backshift operator, $\psi$ and $\phi$ are polynomials in $B$ and
$d \in (-1/2, 1/2)$. Note that under appropriate stationarity conditions the FARIMA-GARCH
process can be written as the linear process in (E2), where $c_k \sim C k^{-\beta}$, 
$\beta = 1 - d$ (we refer to [1] for more details). 

Let $\mathcal{H}_i = \sigma(\eta_i, Z_i, \eta_{i-1}, Z_{i-1}, \ldots)$. Then $E[\eta_i \mid \mathcal{H}_{i-1}] = E[Z_i] E[h_i^{1/2} \mid \mathcal{H}_{i-1}] = 0$.
Consequently, as in Example 4.1 (see (8)), (A) holds for all $d \in (-1/2, 1/2)$, 
since $\eta_i$ are uncorrelated. Furthermore, (B2) and (B1) hold if $n^{2d} h^5 \to 0$ and 
$n^{2d} h \to 0$, respectively, on account of [1, Theorem 3]. 

Example 4.4 (Antipersistent errors). If in Example 4.3 $d \in (-1/2, 0)$, then 
we have antipersistence and (B2) is always fulfilled. Consequently, in the case of 
antipersistent errors the correct scaling for the estimator of both $m$ and $m^*$ is 
always $\sqrt{nh}$. Note that in the case of fixed-design regression antipersistency may 
improve convergence beyond i.i.d. rates, see e.g. [1], [2]. 

Example 4.5 (Stochastic volatility). Let $T$ and $W$ be real-valued functionals. 
Define 



Let 

$$\varepsilon_i = T(\varepsilon_i^*) - E[T(\varepsilon_i^*)]$$

and $\mathcal{H}_i = \sigma(\eta_i, Z_i, \eta_{i-1}, Z_{i-1}, \ldots)$. Note that $R_i$ is $\mathcal{H}_{i-1}$-measurable. For
simplicity, assume that $T(uv) = T(u)T(v)$, which applies e.g. to polynomials 
T{u) = \u\^. Then 



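$$\sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}] = E[T(Z)] \sum_{i=1}^n \left\{ E[T(R_i) \mid \mathcal{H}_{i-1}] - E[T(R_i)] \right\} = E[T(Z)] \sum_{i=1}^n \left\{ T(R_i) - E[T(R_i)] \right\}.$$

Thus, if $c_k \sim k^{-(\alpha+1)/2}$, $\alpha \in (0,1)$, the conditions for (B2) and (B1) are the 
same as for nonlinear transformations of linear processes in Example 4.2, 
substituting $T \to T \circ W$. 
Furthermore 

$$\sum_{i=1}^n \left\{ E[\varepsilon_i \mid \mathcal{H}_{i-1}] - \varepsilon_i \right\} = \sum_{i=1}^n T(R_i) \left\{ E[T(Z)] - T(Z_i) \right\} =: \sum_{i=1}^n T(R_i) U_i.$$

Note that $T(R_i)U_i$, $i \ge 1$, are uncorrelated, thus (A) is always fulfilled. 

If $\varepsilon_i = \varepsilon_i^* = Z_i R_i$, then $E[\varepsilon_i \mid \mathcal{H}_{i-1}] = 0$ and, since the random variables 
$\varepsilon_i$, $i \ge 1$, are uncorrelated, $\sum_{i=1}^n \varepsilon_i = O_P(\sqrt{n})$. Thus, the memory parameter $\alpha$ 
has no influence on the asymptotic behavior of either $\hat m_h$ or $\hat m_h^*$. 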

Example 4.6 (LARCH processes). Define 

$$\varepsilon_i^* = Z_i R_i, \qquad R_i = a + \sum_{k=1}^{\infty} c_k \varepsilon_{i-k}^*, \qquad a > 0,$$

and assume that $\sum_{k=1}^{\infty} c_k^2 < 1$, $c_k \sim k^{-(\alpha+1)/2}$, $\alpha \in (0,1)$. Let 

$$\varepsilon_i = T(\varepsilon_i^*) - E[T(\varepsilon_i^*)]$$

and $\mathcal{H}_i = \sigma(\varepsilon_i^*, Z_i, \varepsilon_{i-1}^*, Z_{i-1}, \ldots)$. The random variable $R_i$ is $\mathcal{H}_{i-1}$-measurable. 
As in Example 4.5, assume that $T(uv) = T(u)T(v)$ so that 

$$\sum_{i=1}^n \left\{ E[\varepsilon_i \mid \mathcal{H}_{i-1}] - \varepsilon_i \right\} = \sum_{i=1}^n T(R_i) \left\{ E[T(Z)] - T(Z_i) \right\} =: \sum_{i=1}^n T(R_i) U_i$$

and $T(R_i)U_i$, $i \ge 1$, are uncorrelated, thus (A) is always fulfilled. Furthermore, 

$$\sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}] = E[T(Z)] \sum_{i=1}^n \left\{ T(R_i) - E[T(R_i)] \right\}.$$

Although this expression has the same form as in Example 4.5, $R_i$ is not a linear 
process based on i.i.d. random variables. Nevertheless, from [3, Theorem 1.1] 
we conclude that if $T$ is twice differentiable and $E[R_1 T(R_1)] \ne 0$, then the 
scaling factor for the latter sum is the same as for linear processes in Example 
4.1. Thus, for the conditions (B2) and (B1) to be fulfilled, we need (11) and 
(12), respectively. 



5 Proofs 

In the proofs we apply the concepts of martingale approximation and Hermite 
expansion. In the context of nonparametric estimation the first method was 
introduced in [19] and [17]; the latter is a standard tool in the LRD setting, see 
e.g. [18]. 

Let us note that under the regularity assumptions we have: 

$$E[K_h(x - X_1)] = h f(x) + h^3 \frac{f''(x)}{2} \kappa_2 + o(h^3). \qquad (13)$$
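
The expansion (13) can be checked in closed form for one concrete case. The sketch below assumes a Gaussian kernel and standard normal predictors (both are choices made here for illustration, with hypothetical function names); then $E[K((x - X_1)/h)]$ equals $h$ times the $N(0, 1 + h^2)$ density at $x$, $\kappa_2 = 1$ and $f''(x) = (x^2 - 1) f(x)$.

```python
import math

def normal_density(x, variance=1.0):
    """Density of N(0, variance) at x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def exact_kernel_mean(x, h):
    """E[K((x - X_1)/h)] for Gaussian K and X_1 ~ N(0,1): a Gaussian convolution."""
    return h * normal_density(x, 1.0 + h * h)

def expansion_13(x, h):
    """Right-hand side of (13) without the o(h^3) remainder."""
    f = normal_density(x)
    f_second = (x * x - 1.0) * f   # second derivative of the N(0,1) density
    kappa_2 = 1.0                  # second moment of the Gaussian kernel
    return h * f + h**3 * f_second * kappa_2 / 2.0

def remainder(x, h):
    """Absolute gap between the exact value and the two-term expansion."""
    return abs(exact_kernel_mean(x, h) - expansion_13(x, h))
```

In this smooth example the remainder behaves like $h^5$, so halving $h$ shrinks it by a factor of about 32, which is well within the $o(h^3)$ claimed in (13).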

We may write 

$$\hat m_h(x) - m_n(x) = \frac{\hat f_h(x)}{f(x)} \left( \hat m_h(x) - m_n(x) \right) + \left( \hat m_h(x) - m_n(x) \right) \frac{f(x) - \hat f_h(x)}{f(x)}. \qquad (14)$$

Since $\hat f_h$ is a consistent estimator of $f$, it suffices to study the first part only. 
Decompose 

$$\frac{\hat f_h(x)}{f(x)} \left( \hat m_h(x) - m_n(x) \right) = \frac{1}{nh f(x)} \sum_{i=1}^n K_h(x - X_i) \left( m(X_i) - m_n(x) \right) + \frac{1}{nh f(x)} \sum_{i=1}^n K_h(x - X_i)\, \varepsilon_i =: N_n(x) + N_n'(x).$$

The parts $N_n(\cdot)$ and $N_n'(\cdot)$ are decomposed further as follows: 

Nnix) = -j^{ni{x) ~ m,n{x)){fh{x) - E[fh{x)]) 
1 " 

+ -7-7^^ (i^h {x - X,) (m(X,) - m(x)) - E[if;, (x - X,) (m(X,) - m(a;))|A',_i]) 
nrij[x) 

n 

+ ^7rT E (Ef^/. (2^ - ^0 ("^(^0 - m{x))\X,^^] - ^Kh (x - X,) (m(X,) - m(x))]) 
nhf{x) ^ 

=: iV„,i(x) + Nn^oix) + Nn,2{x). 
Likewise, 

1 " 

1 " 

nnj[x) 

1 " 
nnjyx) 

Consequently, for $\hat m_h(x) - m_n(x)$ we have the following decomposition: 

("1/^(2^) - = N,M + N,,^i{x)+N.n.2{x) + A'U{x) + D,,{x). (15) 

Note that $M_n(\cdot)$ and $N_{n,0}(\cdot)$ are always martingales. 

Assume first (P1). Then $N_{n,2}(\cdot) = 0$. Furthermore, 

$$D_n(x) = \frac{1}{nh f(x)} E[K_h(x - X_1)] \sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}]. \qquad (16)$$

Assume now (P2). Let Xi^^^i = E[Xi\Xi^i] = OfcCi-fc- Let /^,(-) be 

the density of Let 7^ = Var(Ari.o), write = Xi^i^if-f and note that 
is independent of $\zeta_i$. Let $H_q(\cdot)$ be the $q$th Hermite polynomial. Applying the 
Hermite expansion we represent $N_{n,2}(x)$ as 



where 

$$L(q; u) = L(q; u, h, \gamma) = E\left[ K_h(x - (u + \gamma Z_1)) \left( m(u + \gamma Z_1) - m(x) \right) H_q(Z_1) \right].$$

Also, 



^ oo ^ n ^ 

Dn{x) = -rT^5]-5]i/,(^.0Eh|H.-i] / J{q;u)fc,iu) du, 

J^^' q=Q ^' i=l 

where 

$$J(q; u) = J(q; u, h, \gamma) = E\left[ K_h(x - (u + \gamma Z_1)) H_q(Z_1) \right].$$

Note that the summation in the Hermite expansions starts from $q = 0$, since the expanded 
functions do not have mean zero w.r.t. the Gaussian density. 

Furthermore, 

Nn,ii^) = j^imix) ~ mnW)^ £ ^H,{X,)^ (17) 

•l^^' ^ g=l i=l 

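where 

$$C(q) = C(q; h) = E[K_h(x - X_1) H_q(X_1)].$$

As for the shape function $m^*$, we write 

$$\hat m_h^*(x) - m^*(x) = \left( \hat m_h^*(x) - m_n^*(x) \right) + \left( m_n^*(x) - m^*(x) \right),$$

where $m_n^*(x) = m_n(x) - \int m f$. Clearly, $m_n^*(x) - m^*(x) = m_n(x) - m(x)$. Let 
$\Theta_1 = \frac{1}{n} \sum_{i=1}^n m(X_i)$, $\Theta_2 = \frac{1}{n} \sum_{i=1}^n \varepsilon_i$. With this notation we write 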



fh{x) , . 
-jj-^ [m^(x) - m 

1 " 

= N^.oix) + Af„(x) + J2 ^h^^ - (/ ™^ - ^1 
1 " 

+ Dn{x) - ~ ®2 + iVn4(a^) + NnAx)- (18) 

nn.j[x) 

The crucial difference between $\hat m_h(\cdot)$ and $\hat m_h^*(\cdot)$ is that the possibly LRD part $D_n(x)$ 
is replaced with 

1 " 

-Dn(x) - — — Y^hix^ X,) 62 D^[x) - E„ix). 
nn.j[x) 

5.1 Proof of Propositions 2.1, 2.3 

Recall (15). Since $N_{n,0}(x)$ and $M_n(x)$ are martingales we may easily conclude 
that 

$$\mathrm{Var}(N_{n,0}(x)) = O(h/n), \qquad \mathrm{Var}(M_n(x)) \sim \frac{\kappa_1}{nh f(x)}, \qquad (19)$$

which means that $N_{n,0}(x)$ is negligible. Furthermore, the martingale part $M_n(x)$ 
may be studied using standard tools (see the proof below in Section 5.1.1). 

Lemma 5.1. Under the conditions of Proposition 2.1, 

$$\sqrt{\frac{nh f(x)}{\kappa_1}}\, M_n(x) \xrightarrow{d} N(0,1).$$

Under (P1), using (2), 

$$\mathrm{Var}[N_{n,1}(x)] = O(h^4 (nh)^{-1}) = O(h^3/n),$$

so that this term is negligible w.r.t. $M_n(\cdot)$ as well. Furthermore, using (16) and 
(13) we have 

$$D_n(x) = \left( 1 + h^2 \frac{f''(x)}{2 f(x)} \kappa_2 + o(h^2) \right) \frac{1}{n} \sum_{i=1}^n E[\varepsilon_i \mid \mathcal{H}_{i-1}]. \qquad (20)$$

Consequently, under (P1) the result follows from Lemma 5.1 and assumption 
(B1), which makes $D_n(\cdot)$ negligible. 

Now, we work under the assumption (P2). Recall (17). We split the stochas- 
tic term there as 

n CO n ^/ N 

i=l 5=2 i=l ^' 

Using orthonormality of the Hermite polynomials, the variance of the latter 
term is 

i=l 9=2 i=l 

where || • || stands for norm with respect to the Gaussian measure. Since 
||if?i(x — •)|| and C(l) ^ hxf{x), we conclude that the leading term in Nn,i{x) 
is 

--^{m{x) - mn{x))-^j^-y2x,. 
f[x) hf{x) n ^ 

This implies 

$$\mathrm{Var}(N_{n,1}(x)) = O(h^4 n^{-\alpha_X}). \qquad (21)$$

A similar consideration applies to $D_n$ and $N_{n,2}$. Using the independence 
of $Z_1$ and $\zeta_1$, we have 

$$\int J(0; u) f_\zeta(u)\,du = E[K_h(x - X_1)] = h f(x) + \frac{1}{2} h^3 f''(x) \int u^2 K(u)\,du + o(h^3). \qquad (22)$$

This yields 

1 " 

Dn{x)^-yE[e^\m-i]+op{l). (23) 

77 Z J 



n 

i=l 



The leading term in $N_{n,2}(x)$ is 

E[L(l;Ci)] 1 

which implies 

$$\mathrm{Var}(N_{n,2}(x)) = O(h^4 n^{-\alpha_X}). \qquad (24)$$

Consequently, under (P2), the result follows from Lemma 5.1 and assumption
(B1), which makes $D_n(\cdot)$ negligible, together with (C1), which makes 
$N_{n,1}(\cdot) + N_{n,2}(\cdot)$ negligible. 

We prove now Proposition 2.3 assuming (P2). Recall (18). It was proven 
before that $N_{n,0}(x)$, $N_{n,1}(x)$ and $N_{n,2}(x)$ are negligible. The first term in the 
Hermite expansion for $D_n(x)$ is 



1 " 
-^E[J(0;Ci)]^Eh|H.-i] 



i=l 



lia+ax > l,thenX;r=iE[ei|'Hi-i]Zi = Op(V^), otherwise X^Li Ehl'^i-il^i = 
Op(ni-("+"-'^)/2). Since for q>0, E[J(g;Ci)] 0(/i), we conclude that D„(x) 
can be written as 

— — -E[J(0; Ci)] ^Ehl^^^-i] + Op(n-("+"-)/2) + Op(n-i/2). 

The first two terms in the Hermite expansion for En{x) are 

1 1 1 " 

©2;^E[i^;7(.T - Xi)] + Q^-^^[Ku{x - Xi)Xi]- ^X,. 

Using (22), and ^ ^ Op(n^"^/^), we conclude that the leading terms 

in the difference N'^ 2(2^) ~ En{x), are 

1 " 

-V (E[e,|H,-i]-£,) 

71 

i=l 

+Op ^ E[e,|H.-i]^ + Op (^"~""^'^ E E[£.|H,-i]^ . (25) 

The first two terms are negligible under the conditions (A) and (B2), respec- 
tively. The last term is negligible on account of (4), which is weaker than (C2). 
Finally, 
which makes the term 



nhf{. 

negligible on account of condition (4). 
5.1.1 Proof of Lemma 5.1 

Proof of Lemma 5.1. The proof is similar to [19, Lemma 2] and [15, Lemma 3.1]. 
Let $R_i = (nh\kappa_1)^{-1/2} K_h(x - X_i)\, \varepsilon_i / \sqrt{f(x)}$ and $\tilde R_i = R_i - E[R_i \mid \mathcal{F}_{i-1}]$. From the 
martingale central limit theorem it suffices to show the Lindeberg condition 

n 

^ E [i?,'l{|A,|>5}] ^ for each 6 > 

i=l 

and convergence of conditional variances 



1=1 

Let be the density of £i. As for the Lindeberg condition we have 

n n 
1=1 i=l 

^0, 



(26) 



< a 



^^-'-{|e.|>Ci<5v^} 



where $C_0 = 1/(\kappa_1 f(x))$, $C_1 = (\sqrt{C_0}\,\sup_t K(t))^{-1}$ and $C_2 = C_0 \int K^2$.
As for the conditional variances, note first that

$E[\tilde R_i^2 \mid \mathcal{F}_{i-1}] = E[R_i^2 \mid \mathcal{F}_{i-1}] - \left(E[R_i \mid \mathcal{F}_{i-1}]\right)^2$

and note that the second term is of smaller order than the first one. Now,

1—1 J \ ) 1 



^ / K^{v)fix^vh)dv) iX^{E[e2|j-,_,]_E[e?]} 



Now, the deterministic term in the bracket is asymptotically equal to 1. The
second part converges to 0 in probability by ergodicity. Consequently, the
expression (26) is proven. □
5.2 Proof of Propositions 2.5 and 2.7

Proof. Recall (2), (14) and (15). Under (P1), the result of Proposition 2.5
follows from (19), (20). Under (P2), we use the expansion with (19), (23), (21),
(24).

As for Proposition 2.7, note that all considerations for $N'_{n,2}(x) - E_n(x)$
leading to (25) are in fact in $L^2$. Therefore, on account of (5)

Var (K,2(a;) - E^{x)) - /i^n"" + O . 

The first part is of course negligible w.r.t. the bias term. 
Moreover, writing 

^|:A'.(.-xo(/m/-e,) =^E[i..(.-x,)] (/ m/-e, 

we obtain that the variance contribution of this term is $E^2[m(X_1)X_1]\,n^{-\alpha_X}$.

□ 

5.3 Cross validation properties 

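Under the condition (E2) one can establish the following moment bounds:

$E[\varepsilon_j^2 \varepsilon_l \varepsilon_{l'}] = O(\mathrm{Cov}(\varepsilon_l, \varepsilon_{l'}))$, (27)
$E[\varepsilon_j \varepsilon_{j'} \varepsilon_l \varepsilon_{l'}] = O(E[\varepsilon_j \varepsilon_{j'}]\,E[\varepsilon_l \varepsilon_{l'}])$, (28)
$\mathrm{Cov}(\varepsilon_j^2, \varepsilon_{j'}^2) = O(\mathrm{Cov}(\varepsilon_j, \varepsilon_{j'}))$. (29)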
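
Bounds of this type can be illustrated in the Gaussian case, where fourth moments factor into products of covariances (Isserlis' formula). The sketch below is only an illustration under that assumption: it does not verify condition (E2), and the function name, the correlation values and the sample size are hypothetical choices made for the example. For a zero-mean, unit-variance Gaussian triple it compares a Monte Carlo estimate of $E[\varepsilon_j^2 \varepsilon_l \varepsilon_{l'}]$ with the exact value $r_{ll'} + 2 r_{jl} r_{jl'}$, and likewise $\mathrm{Cov}(\varepsilon_j^2, \varepsilon_l^2)$ with $2 r_{jl}^2$.

```python
import numpy as np

def gaussian_moment_check(r_jl, r_jlp, r_llp, n_samples=400_000, seed=0):
    """Compare Monte Carlo and exact Gaussian fourth moments.

    Assumes (e_j, e_l, e_l') is zero-mean Gaussian with unit variances and
    pairwise correlations r_jl, r_jlp, r_llp (hypothetical inputs).
    """
    cov = np.array([[1.0, r_jl, r_jlp],
                    [r_jl, 1.0, r_llp],
                    [r_jlp, r_llp, 1.0]])
    rng = np.random.default_rng(seed)
    e = rng.multivariate_normal(np.zeros(3), cov, size=n_samples)

    # E[e_j^2 e_l e_l'] : Monte Carlo estimate and Isserlis value
    mc_mixed = np.mean(e[:, 0] ** 2 * e[:, 1] * e[:, 2])
    exact_mixed = r_llp + 2.0 * r_jl * r_jlp

    # Cov(e_j^2, e_l^2) : Monte Carlo estimate and Isserlis value
    mc_sq = np.mean(e[:, 0] ** 2 * e[:, 1] ** 2) - 1.0
    exact_sq = 2.0 * r_jl ** 2
    return mc_mixed, exact_mixed, mc_sq, exact_sq

mc_mixed, exact_mixed, mc_sq, exact_sq = gaussian_moment_check(0.3, 0.2, 0.4)
```

In this Gaussian setting the covariance of the squares is quadratic in the correlation, so it is in particular of the order required by (29).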

5.3.1 Asymptotic expansion for ASE(h)
Recall that we work under the condition (P1). Define

1 " 



Write 



ASE(/i) = /2i + /22 + /23 ^ V T^— - V A"/. - X,) e, 
Let $p(x) = (mf)''(x) - m(x) f''(x)$. Uniformly over $\{x : f(x) > 0\}$ we have



R{x) 



h'^K2 p{x) 



2 fix) 

Using (30) we write the second part as 



0{h\\ + op{l))). 



'22 



(l + op(l)). 



(30) 



so that via E 



/ {p{x))^ / f{x) dx we conclude 



^22 -nl-r 



pHx)_ 
fix) 



dx = op{}i^ I \pn). 



(31) 



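Now, if $A_i$, $i \geq 1$, are random variables with the same mean, and $\bar A_i = A_i - E[A_i]$,
then we have the following decomposition (which will be used many times):

$\sum_{i=1}^{n} A_i \varepsilon_i = E[A_1] \sum_{i=1}^{n} \varepsilon_i + \sum_{i=1}^{n} \bar A_i \varepsilon_i.$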
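
The splitting of a weighted sum of errors into a mean part and a centered part is a purely algebraic identity, which the following minimal sketch checks numerically. The weights, their common mean and the i.i.d. Gaussian errors are hypothetical placeholders chosen for the example; in particular the errors here are not long-range dependent.

```python
import numpy as np

def split_weighted_sum(a, eps, mean_a):
    """Split sum(a_i * eps_i) into mean_a * sum(eps_i) and the centered remainder."""
    first = mean_a * eps.sum()               # E[A_1] * sum of the errors
    second = ((a - mean_a) * eps).sum()      # sum of centered weights times errors
    return first, second

rng = np.random.default_rng(1)
n = 1000
a = rng.uniform(1.0, 3.0, size=n)   # hypothetical weights with common mean 2
eps = rng.standard_normal(n)        # placeholder errors (i.i.d., not LRD)

first, second = split_weighted_sum(a, eps, mean_a=2.0)
total = (a * eps).sum()             # equals first + second up to rounding
```

With i.i.d. errors both parts are of order $\sqrt{n}$. Under long memory the plain sum of the errors grows faster than $\sqrt{n}$, which is why the first part tends to dominate.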



Typically, in the LRD setting, the first part dominates. Bearing in mind the above
decomposition and since $\hat f_h$ is a consistent estimator of $f$,



'23 



E 



piX 



/2(X 



i^/U (^1 - X2 



■Y.e,il + opil)) 



^itt4^.K,iX.^X,)s,il + opil)). 



The second term is negligible w.r.t. the first one. Noting that



E 



piX 



[PiXi 



i^Kn (Xi ~X2) =h I 



= h / pix) dx + Oih-"), 



we get 



/23=0J^E, 



(32) 
It remains to deal with $I_{21}$. Split it as



n n 

1 = 1 ■' ^ ^' 3,i'=l 



6 jS-j 



E 



1 



(1 + «(!)) 



1 1 



— • -1211 + -1212 ^213- 

Clearly, 

hi 

To deal with /211, define 
and note that 



K^iu) du + 0{h/n). 



-Kh{X,-Xj)KhiX,~Xj,), 



(33) 



E[Ti{h,X2,X3)] = l + h'K2 I r + o{h^). 



(34) 



Split $I_{211}$ as



n n 



n n 



i— 1 jJ' = i 



2—1 i.j' = i 



The variance of the last term is proportional to



iECov 



/ 



^ T,(/i,X„X,0, ^ T,,{h,Xi,Xi,) E[e,e,,siei,]. 



If all six indices are different, then the term vanishes. If the indices $l, l'$ are
different, but $i = i'$, then via (28) we obtain that the variance contribution is



1 
If $j = l$ and $j' \neq l'$, together with $i = i'$ or $i \neq i'$, respectively, then via (27) the
variance contribution is, respectively,



1 



3„^2-C( 



nh nn 



O 



1 



o 



1 



(35) 



22 



If $j = l$ and $j' = l'$, the contribution is $O(1/(nh)^2)$. Consequently, via (34) and
(35), for $I_{211}$ we have



2.. " 



^211 ej£j'+op 



Now, the variance of $I_{213}$ can be written as



ni+"/2/ii/2 
(36) 



\- ± Cov(s|,4)E 



"1 1 



Using (29), one can verify that the above expression is 
Via (33), (36), (37), 



o 



1 



(37) 



/l2 " 

=1 
3^3' 



Op 



1 



n(i+")/2 



Op 



1 



„l+a/2/il/2 



1 



Furthermore, 



1 " 



Combining this with (31) and (32), we have 



ASE(/i) - — / K^iu) du - 4— / ^i-^ dx - C^n~°' - C^^a/i^n"" / /" 
nh J 47 f(x) ' 



Op 



-Op 



1 



^ni+"/2/ii/2^ 
Consequently, (6) is proven. 



1 



(38) 
5.3.2 Asymptotic expansion for CV(h)



Recall that

$CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat m_{h,i}(X_i)\right)^2.$



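Note that we may write

$CV(h) = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \hat m_h(X_i)\right)^2 + O_p(1/(nh))$

$\quad = ASE(h) + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2 + \frac{2}{n} \sum_{i=1}^{n} \left(m(X_i) - \hat m_h(X_i)\right) \varepsilon_i + O_p(1/(nh)).$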



Here, $O_p(1/(nh))$ comes from replacing $\hat m_{h,i}(\cdot)$ by $\hat m_h(\cdot)$. The second-to-last term
is treated in the very same way as we dealt with $I_{23}$ and $I_{211}$, see (32) and (36),
respectively. Therefore, it is



uniformly over $[B_1 h_{\mathrm{opt}}, B_2 h_{\mathrm{opt}}]$.
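
The relation between the residual sum of squares, ASE and the cross term used in this subsection can be checked numerically. The sketch below is a hedged illustration, not the authors' implementation: it assumes a Nadaraya-Watson estimator with a Gaussian kernel, an unweighted ASE, i.i.d. Gaussian errors and a hypothetical regression function $m(x) = \sin(2\pi x)$; the function names are invented for the example.

```python
import numpy as np

def nw_fit(x_eval, x, y, h):
    """Nadaraya-Watson estimate at x_eval (Gaussian kernel is an assumption)."""
    u = (x_eval[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y) / w.sum(axis=1)

def nw_leave_one_out(x, y, h):
    """Leave-one-out Nadaraya-Watson estimates at the design points."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    np.fill_diagonal(w, 0.0)            # drop the i-th observation from its own fit
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(2)
n, h = 400, 0.1
x = rng.uniform(0.0, 1.0, n)
m = np.sin(2 * np.pi * x)               # hypothetical regression function
eps = 0.3 * rng.standard_normal(n)      # placeholder errors (i.i.d., not LRD)
y = m + eps

m_hat = nw_fit(x, x, y, h)
ase = np.mean((m_hat - m) ** 2)             # unweighted averaged squared error
rss = np.mean((y - m_hat) ** 2)             # full-sample residual sum of squares / n
noise = np.mean(eps ** 2)                   # (1/n) * sum of eps_i^2
cross = 2.0 * np.mean((m - m_hat) * eps)    # (2/n) * sum of (m - m_hat) * eps_i
cv = np.mean((y - nw_leave_one_out(x, y, h)) ** 2)
```

The identity `rss == ase + noise + cross` holds exactly up to rounding. The cross-validation criterion `cv` differs from `rss` only through the leave-one-out correction, and the cross term is the one whose behaviour under long memory separates CV(h) from ASE(h).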

Acknowledgement 

The work of the first author was supported by an NSERC (Natural Sciences and
Engineering Research Council of Canada) grant. The work of the second author
was conducted while he was a Postdoctoral Fellow at the University of Ottawa.

References 

[1] Jan Beran and Yuanhua Feng. Local polynomial estimation with a
FARIMA-GARCH error process. Bernoulli, 7(5):733-750, 2001.

[2] Jan Beran and Yuanhua Feng. Local polynomial fitting with long-
memory, short-memory and antipersistent errors. Ann. Inst. Statist. Math.,
54(2):291-311, 2002.

[3] István Berkes and Lajos Horváth. Asymptotic results for long memory
LARCH sequences. Ann. Appl. Probab., 13(2):641-668, 2003.

[4] Bing Cheng and P. M. Robinson. Semiparametric estimation from time
series with long-range dependence. J. Econometrics, 64(1-2):335-353, 1994.

[5] Gerda Claeskens and Peter Hall. Effect of dependence on stochastic mea-
sures of accuracy of density estimators. Ann. Statist., 30(2):431-454, 2002.

[6] Sándor Csörgő and Jan Mielniczuk. Random-design regression under long-
range dependent errors. Bernoulli, 5(2):209-224, 1999.

[7] Sándor Csörgő and Jan Mielniczuk. The smoothing dichotomy in random-
design regression with long-memory errors based on moving averages.
Statist. Sinica, 10(3):771-787, 2000.

[8] Sam Efromovich. How to overcome the curse of long-memory errors. IEEE
Trans. Inform. Theory, 45(5):1735-1741, 1999.

[9] Liudas Giraitis, Remigijus Leipus, and Donatas Surgailis. Recent advances
in ARCH modelling. In Long memory in economics, pages 3-38. Springer,
Berlin, 2007.

[10] Peter Hall and Jeffrey D. Hart. Convergence rates in density estimation for
data from infinite-order moving average processes. Probab. Theory Related
Fields, 87(2):253-274, 1990.

[11] Peter Hall and Jeffrey D. Hart. Nonparametric regression with long-range
dependence. Stochastic Process. Appl., 36(2):339-351, 1990.

[12] Peter Hall, Soumendra Nath Lahiri, and Jörg Polzehl. On bandwidth choice
in nonparametric regression with both short- and long-range dependent
errors. Ann. Statist., 23(6):1921-1936, 1995.

[13] Peter Hall, Soumendra Nath Lahiri, and Young K. Truong. On band-
width choice for density estimation with dependent data. Ann. Statist.,
23(6):2241-2263, 1995.

[14] Hwai-Chung Ho and Tailen Hsing. Limit theorems for functionals of moving
averages. Ann. Probab., 25(4):1636-1669, 1997.

[15] Rafal Kulik. Nonparametric deconvolution problem for dependent se-
quences. Electron. J. Statist., 2:722-740, 2008.

[16] Rafal Kulik and Marc Raimondo. Wavelet regression in random design
with heteroscedastic dependent errors. Ann. Statist., 37:3396-3430, 2009.

[17] Jan Mielniczuk and Wei Biao Wu. On random-design model with dependent
errors. Statist. Sinica, 14(4):1105-1126, 2004.

[18] Murad S. Taqqu. Fractional Brownian motion and long-range depen-
dence. In Theory and applications of long-range dependence, pages 5-38.
Birkhäuser Boston, Boston, MA, 2003.

[19] Wei Biao Wu and Jan Mielniczuk. Kernel density estimation for linear
processes. Ann. Statist., 30(5):1441-1459, 2002.

[20] Yuhong Yang. Nonparametric regression with dependent errors. Bernoulli,
7(4):633-655, 2001.
